New algorithms for finding approximate frequent item sets
Abstract
In standard frequent item set mining, a transaction supports an item set only if all items in the set are present. In many cases, however, this requirement is too strict and can make it impossible to find certain relevant groups of items. This drawback can be remedied by relaxing the support definition so that some items of a given set may be missing from a transaction. The resulting item sets have been called approximate, fault-tolerant or fuzzy item sets. In this paper we present two new algorithms to find such item sets. The first is an extension of item set mining based on cover similarities; it computes and evaluates the subset size occurrence distribution with a scheme that is related to the Eclat algorithm. The second employs a clustering-like approach in which the distances are derived from the item covers with distance measures for sets or binary vectors, and which is initialized with a one-dimensional Sammon projection of the distance matrix. We demonstrate the benefits of our algorithms by applying them to a concept detection task on the 2008/2009 Wikipedia Selection for schools and to the neurobiological task of detecting neuron ensembles in (simulated) parallel spike trains.
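The relaxed support definition described in the abstract can be illustrated with a small brute-force sketch. It is not a reproduction of either of the paper's algorithms: it simply counts, for a given item set, how many transactions contain exactly k of its items (a naive version of a subset size occurrence distribution) and derives a fault-tolerant support from that. The function names, the `max_missing` parameter, and the toy transactions are assumptions made for illustration. The Jaccard distance is used as one common distance measure for sets; the abstract does not say which measures the authors use.

```python
from collections import Counter


def subset_size_distribution(transactions, item_set):
    """Map k -> number of transactions containing exactly k items of item_set."""
    items = frozenset(item_set)
    return Counter(len(items & frozenset(t)) for t in transactions)


def fault_tolerant_support(transactions, item_set, max_missing=1):
    """Count transactions that miss at most max_missing items of item_set.

    max_missing=0 gives the standard (strict) support.
    max_missing is a hypothetical parameter name, not taken from the paper.
    """
    items = frozenset(item_set)
    dist = subset_size_distribution(transactions, items)
    threshold = len(items) - max_missing
    return sum(count for k, count in dist.items() if k >= threshold)


def jaccard_cover_distance(transactions, item_a, item_b):
    """Jaccard distance between the covers of two items.

    The cover of an item is the set of indices of transactions containing it.
    """
    cover_a = {i for i, t in enumerate(transactions) if item_a in t}
    cover_b = {i for i, t in enumerate(transactions) if item_b in t}
    union = cover_a | cover_b
    if not union:
        return 0.0  # both covers empty: treat the items as identical
    return 1.0 - len(cover_a & cover_b) / len(union)


# Toy data, invented for this example.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "d"},
    {"d"},
]

strict = fault_tolerant_support(transactions, {"a", "b", "c"}, max_missing=0)
relaxed = fault_tolerant_support(transactions, {"a", "b", "c"}, max_missing=1)
distance_ab = jaccard_cover_distance(transactions, "a", "b")
```

On this toy data only one transaction supports {a, b, c} under the strict definition, while three do when one missing item is tolerated, which is the effect the relaxed definition is meant to capture.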
Similar resources
A Novel Approach for finding Frequent Item Sets with Hybrid Strategies
Frequent item sets mining plays an important role in association rules mining. Over the years, a variety of algorithms for finding frequent item sets in very large transaction databases have been developed. Therefore, a number of methods have been proposed recently to discover approximate frequent item sets. This paper proposes an efficient SMine (Sorted Mine) Algorithm for finding frequent ite...
Comparison of Frequent Item Set Mining Algorithms
Frequent item sets mining plays an important role in association rules mining. Over the years, a variety of algorithms for finding frequent item sets in very large transaction databases have been developed. The main focus of this paper is to analyze the implementations of the Frequent item set Mining algorithms such as SMine and Apriori Algorithms. General Terms-Data Mining, Frequent Item sets,...
Algorithm for Efficient Multilevel Association Rule Mining
Over the years, a variety of algorithms for finding frequent item sets in very large transaction databases have been developed. Finding frequent item sets is a basic problem in multi-level association rule mining, so fast algorithms for solving it are needed. This paper presents an efficient version of the Apriori algorithm for mining multi-level association rules in large databases to fi...
An efficient hash based algorithm for mining closed frequent item sets
Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent item sets, and then forming conditional implication rules among them. Efficient algorithms to discover frequent patterns are crucial in data mining research. Finding frequent item sets is computationally the most expensive step i...
A Hybrid GeneticMax Algorithm for Improving the Traditional Genetic Based Approach for Mining Maximal Frequent Item Sets
Mining Frequent item sets is one of the most useful data mining methods which discovers important relationships among attributes of data sets. Initially it was developed for market basket analysis, but these days it is used to solve any task where discovering hidden relationships among different attributes is required. Mining frequent item sets plays a vital role for generating association rule...
Journal title:
- Soft Comput.
Volume 16
Publication date: 2012